On Cognitive Preferences and the Plausibility of Rule-based Models
It is conventional wisdom in machine learning and data mining that logical
models such as rule sets are more interpretable than other models, and that
among such rule-based models, simpler models are more interpretable than more
complex ones. In this position paper, we question this latter assumption by
focusing on one particular aspect of interpretability, namely the plausibility
of models. Roughly speaking, we equate the plausibility of a model with the
likelihood that a user accepts it as an explanation for a prediction. In
particular, we argue that, all other things being equal, longer explanations
may be more convincing than shorter ones, and that the predominant bias for
shorter models, which is typically necessary for learning powerful
discriminative models, may not be suitable when it comes to user acceptance of
the learned models. To that end, we first recapitulate evidence for and against
this postulate, and then report the results of an evaluation in a
crowd-sourcing study based on about 3,000 judgments. The results do not reveal
a strong preference for simple rules, whereas we can observe a weak preference
for longer rules in some domains. We then relate these results to well-known
cognitive biases such as the conjunction fallacy, the representativeness
heuristic, or the recognition heuristic, and investigate their relation to rule
length and plausibility.
Comment: V4: Another rewrite of section on interpretability to clarify focus on plausibility and relation to interpretability, comprehensibility, and justifiability
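The conjunction fallacy mentioned in the abstract has a simple probabilistic core that can be sketched in a few lines (the synthetic data and rules below are my own illustration, not from the paper): adding a condition to a rule can only shrink the set of instances it covers, so a longer rule is never the more probable one, even if people judge it more plausible as an explanation.

```python
# Illustration of the conjunction fallacy in terms of rule coverage:
# P(a AND b) <= P(a), so a longer rule can never cover more instances.
import random

random.seed(0)

# Synthetic instances with two boolean attributes.
data = [{"a": random.random() < 0.6, "b": random.random() < 0.5}
        for _ in range(10_000)]

def coverage(rule):
    """Fraction of instances satisfying every condition of the rule."""
    return sum(all(x[attr] for attr in rule) for x in data) / len(data)

short_rule = ["a"]        # "IF a THEN class"
long_rule = ["a", "b"]    # "IF a AND b THEN class"

# The conjunction can only restrict the covered set.
assert coverage(long_rule) <= coverage(short_rule)
print(coverage(short_rule), coverage(long_rule))
```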
More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings
RDF2vec is an embedding technique for representing knowledge graph entities
in a continuous vector space. In this paper, we investigate the effect of
materializing implicit A-box axioms induced by subproperties, as well as
symmetric and transitive properties. While it might seem reasonable to assume
that such a materialization before computing embeddings leads to better
embeddings, we conduct a set of experiments on DBpedia which demonstrate that
the materialization actually has a negative effect on the performance of
RDF2vec. In our analysis, we argue that despite the huge body of work devoted
to completing missing information in knowledge graphs, such missing implicit
information is actually a signal, not a defect, and we show examples
illustrating that assumption.
Comment: Accepted at the Workshop on Combining Symbolic and Sub-symbolic methods and their Applications (CSSA 2020)
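The A-box materialization the abstract studies can be sketched as a small forward-chaining step (a minimal illustration of the general technique, not the paper's code; the property names and triples are invented): implicit triples induced by symmetric and transitive properties are added until a fixpoint is reached.

```python
# Minimal A-box materialization sketch: forward-chain symmetric and
# transitive property rules to a fixpoint before computing embeddings.
SYMMETRIC = {"spouse"}
TRANSITIVE = {"partOf"}

triples = {
    ("Alice", "spouse", "Bob"),
    ("Berlin", "partOf", "Germany"),
    ("Germany", "partOf", "Europe"),
}

def materialize(triples):
    """Return the closure of the triple set under the two rule types."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closed):
            # Symmetric: (s p o) entails (o p s).
            if p in SYMMETRIC and (o, p, s) not in closed:
                closed.add((o, p, s))
                changed = True
            # Transitive: (s p o) and (o p o2) entail (s p o2).
            if p in TRANSITIVE:
                for s2, p2, o2 in list(closed):
                    if p2 == p and s2 == o and (s, p, o2) not in closed:
                        closed.add((s, p, o2))
                        changed = True
    return closed

inferred = materialize(triples) - triples
# inferred contains ("Bob", "spouse", "Alice") and ("Berlin", "partOf", "Europe")
print(inferred)
```

The paper's point is that exactly these derived triples, while logically correct, can hurt the resulting RDF2vec embeddings.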
A Semantic Browser for Linked Open Data
Although the Semantic Web was originally designed as a "web for machines", the growing wealth of information in Linked Open Data has become interesting for human users as well. Consequently, quite a few browsers for Linked Open Data have recently been developed.
However, despite being developed for the Semantic Web, those browsers often present alphabetically ordered lists of facts without respecting the semantics of the data.
In our submission to the Semantic Web Challenge, we present a semantic browser for the Semantic Web, which aims at presenting facts from Linked Open Data in semantically coherent groups. This paper introduces the main algorithms and presents an evaluation of the browser with end users.
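The contrast between an alphabetical fact list and semantically coherent groups can be sketched as follows (the predicate-to-group map and the facts are invented for illustration; the browser's actual grouping algorithm is described in the paper):

```python
# Sketch: bucket an entity's predicate-object facts into semantic groups
# instead of listing them alphabetically.
from collections import defaultdict

facts = [
    ("award", "Nobel Prize in Physics"),
    ("birthPlace", "Ulm"),
    ("deathPlace", "Princeton"),
    ("field", "Physics"),
]

# Hypothetical mapping from predicates to display groups.
GROUPS = {
    "birthPlace": "Biography",
    "deathPlace": "Biography",
    "field": "Work",
    "award": "Work",
}

grouped = defaultdict(list)
for pred, obj in facts:
    grouped[GROUPS.get(pred, "Other")].append((pred, obj))

for group, items in sorted(grouped.items()):
    print(group, items)
```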
The Time Traveler's Guide to Semantic Web Research: Analyzing Fictitious Research Themes in the ESWC "Next 20 Years" Track
What will Semantic Web research focus on 20 years from now? We asked this
question to the community and collected their visions in the "Next 20 years"
track of ESWC 2023. We challenged the participants to submit "future" research
papers, as if they were submitting to the 2043 edition of the conference. The
submissions - entirely fictitious - were expected to be full scientific papers,
with research questions, state of the art references, experimental results and
future work, with the goal to get an idea of the research agenda for the late
2040s and early 2050s. We received ten submissions, eight of which were
accepted for presentation at the conference, that mixed serious ideas of
potential future research themes and discussion topics with some fun and irony.
In this paper, we intend to provide a survey of those "science fiction"
papers, considering the emerging research themes and topics, analysing the
research methods applied by the authors in these very special submissions, and
investigating also the most fictitious parts (e.g., neologisms, fabricated
references). Our goal is twofold: on the one hand, we investigate what this
special track tells us about the Semantic Web community and, on the other hand,
we aim at getting some insights on future research practices and directions.
Comment: 13 pages, 8 figures, 2 tables
Transformer-based Subject Entity Detection in Wikipedia Listings
In tasks like question answering or text summarisation, it is essential to
have background knowledge about the relevant entities. The information about
entities - in particular, about long-tail or emerging entities - in publicly
available knowledge graphs like DBpedia or CaLiGraph is far from complete. In
this paper, we present an approach that exploits the semi-structured nature of
listings (like enumerations and tables) to identify the main entities of the
listing items (i.e., of entries and rows). These entities, which we call
subject entities, can be used to increase the coverage of knowledge graphs. Our
approach uses a transformer network to identify subject entities at the
token-level and surpasses an existing approach in terms of performance while
being bound by fewer limitations. Due to a flexible input format, it is
applicable to any kind of listing and is, unlike prior work, not dependent on
entity boundaries as input. We demonstrate our approach by applying it to the
complete Wikipedia corpus and extracting 40 million mentions of subject
entities with an estimated precision of 71% and recall of 77%. The results are
incorporated in the most recent version of CaLiGraph.
Comment: Published at Deep Learning for Knowledge Graphs workshop (DL4KG) at International Semantic Web Conference 2022 (ISWC 2022)
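The token-level identification step can be illustrated with a small decoding sketch (stdlib only, my own simplification: the per-token labels below stand in for the transformer's predictions, and the listing item is invented): contiguous B/I spans are collected into subject-entity mentions.

```python
# Sketch of decoding token-level BIO labels into subject-entity mentions.
# In the paper, a transformer predicts one label per token; here the
# labels are hard-coded to show only the decoding logic.
tokens = ["*", "The", "Beatles", "-", "English", "rock", "band"]
labels = ["O", "B", "I", "O", "O", "O", "O"]  # stand-in model output

def decode_mentions(tokens, labels):
    """Collect contiguous B/I label spans into mention strings."""
    mentions, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B":                 # a new mention starts
            if current:
                mentions.append(" ".join(current))
            current = [tok]
        elif lab == "I" and current:   # mention continues
            current.append(tok)
        else:                          # outside any mention
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions

print(decode_mentions(tokens, labels))  # ['The Beatles']
```

Because the model labels tokens directly, no pre-marked entity boundaries are needed as input, which is the flexibility the abstract refers to.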
NASTyLinker: NIL-Aware Scalable Transformer-based Entity Linker
Entity Linking (EL) is the task of detecting mentions of entities in text and
disambiguating them to a reference knowledge base. Most prevalent EL approaches
assume that the reference knowledge base is complete. In practice, however, it
is necessary to deal with the case of linking to an entity that is not
contained in the knowledge base (NIL entity). Recent works have shown that,
instead of focusing only on affinities between mentions and entities,
considering inter-mention affinities can be used to represent NIL entities by
producing clusters of mentions. At the same time, inter-mention affinities can
help to substantially improve linking performance for known entities. With
NASTyLinker, we introduce an EL approach that is aware of NIL entities and
produces corresponding mention clusters while maintaining high linking
performance for known entities. The approach clusters mentions and entities
based on dense representations from Transformers and resolves conflicts (if
more than one entity is assigned to a cluster) by computing transitive
mention-entity affinities. We show the effectiveness and scalability of
NASTyLinker on NILK, a dataset that is explicitly constructed to evaluate EL
with respect to NIL entities. Further, we apply the presented approach to an
actual EL task, namely to knowledge graph population by linking entities in
Wikipedia listings, and provide an analysis of the outcome.
Comment: Preprint of a paper in the research track of the 20th Extended Semantic Web Conference (ESWC'23)
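The NIL-aware clustering idea can be sketched in a few lines (an illustration of the general scheme under invented toy embeddings, not NASTyLinker's actual code): mentions and known entities are clustered by embedding similarity; a cluster containing a known entity links its mentions to that entity, while a cluster without one represents a NIL entity.

```python
# Sketch: NIL-aware linking via similarity clustering of mentions and
# known entities. 2-d vectors stand in for transformer embeddings.
import math

def cos(u, v):
    """Cosine similarity of two 2-d vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

mentions = {"m1": (1.0, 0.1), "m2": (0.9, 0.2), "m3": (0.0, 1.0)}
entities = {"Q42": (1.0, 0.0)}  # one known KB entity

# Greedy single-link clustering above a similarity threshold.
items = {**mentions, **entities}
clusters = []
for name, vec in items.items():
    for cluster in clusters:
        if any(cos(vec, items[other]) > 0.9 for other in cluster):
            cluster.append(name)
            break
    else:
        clusters.append([name])

# Link each mention to a known entity in its cluster, or to NIL.
links = {}
for cluster in clusters:
    known = [n for n in cluster if n in entities]
    target = known[0] if known else "NIL"
    for m in cluster:
        if m in mentions:
            links[m] = target
print(links)  # m1 and m2 link to Q42, m3 is a NIL mention
```

The real approach additionally resolves conflicts when more than one entity lands in a cluster, using transitive mention-entity affinities.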